Asynchronous COMID: the theoretic basis for transmitted data sparsification tricks on Parameter Server
Authors
Abstract
Asynchronous FTRL and applying the L2 norm at the server are two widely used tricks for improving training efficiency, but their convergence has not been well proved. In this paper, we propose the asynchronous COMID algorithm and prove its convergence. We then establish the equivalence between asynchronous COMID and the two tricks above, so their convergence is proved as well. Experimental results show that asynchronous COMID reduces the burden on the network without any harm to the convergence speed or the final output.

INTRODUCTION

Many tricks are used in machine learning applications to obtain higher training efficiency, better classification accuracy, and the ability to solve non-convex optimization problems. Some of them are well founded and rigorously proved, such as choosing better initial model parameters to reduce training time. Most of the others, however, lack proof and can only be applied properly by relying on the user's experience, such as choosing the batch size or designing a DNN. In practice, the majority of tricks are validated by experiments rather than by rigorous mathematical proof.

Nowadays, the Parameter Server framework, based on delayed SGD algorithms, is the most popular learning framework. However, as the number of workers increases, the burden on the network becomes unaffordable. Asynchronous FTRL and applying the L2 norm at the server are two widely used tricks for alleviating this problem, but neither has been rigorously proved. Hereafter, these two tricks are abbreviated as asynch-FTRL and the L2 norm trick. In this paper, we prove the convergence of Asynchronous Composite Objective MIrror Descent (asynch-COMID). Based on asynch-COMID, we establish the equivalence between asynchronous COMID and the two tricks above; thus their convergence is also proved. In this way we fill the gap between the application and the theory of these two tricks via asynch-COMID.

Machine learning and stochastic optimization

The training process in machine learning can be treated as solving a stochastic optimization problem. The objective function is the mathematical expectation of a loss function that contains a random variable, and the random variable follows a known distribution. In practice, we use the sample frequency to approximate the product of the probability density density(x) and ∆x, since a frequency histogram roughly estimates the curve of the probability density function. So, for a dataset, the objective can be written in the following form:

min_w E[g(X, w)]  (X ∼ a certain distribution D)  =  min_w ∫ g(x, w) density(x) dx
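For a finite dataset, the expectation above is replaced by the empirical mean over the samples, min_w (1/n) Σ_i g(x_i, w), and COMID handles a composite objective (smooth loss plus non-smooth regularizer) by taking a gradient step on the loss followed by a proximal step on the regularizer. The following is a minimal sketch of one such step, assuming a squared-Euclidean Bregman divergence, a logistic loss, and an L1 regularizer, in which case the proximal step is element-wise soft-thresholding and the resulting update is sparse; the names logistic_grad, soft_threshold, comid_step and the step-size and regularization values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the average logistic loss over a sampled mini-batch."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-y * z))          # sigmoid(y * <x, w>)
    return -(X * (y * (1.0 - p))[:, None]).mean(axis=0)

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def comid_step(w, X_batch, y_batch, eta=0.1, lam=0.01):
    """One COMID-style step with a squared-Euclidean Bregman divergence:
    a gradient step on the smooth loss, then the proximal step on the
    non-smooth regularizer r(w) = lam * ||w||_1 (illustrative choices)."""
    g = logistic_grad(w, X_batch, y_batch)
    return soft_threshold(w - eta * g, eta * lam)

# Toy usage: a worker computes one step on its local mini-batch.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
y = np.sign(rng.normal(size=32))
w = np.zeros(10)
w = comid_step(w, X, y)
print("non-zero coordinates after one step:", np.count_nonzero(w))
```

In a Parameter Server setting, a worker would compute such a step on a possibly stale copy of w (the delayed, asynchronous case the paper analyses) and would only need to transmit the non-zero coordinates of the update to the server, which is where the saving in network traffic comes from.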
Similar resources
Decision-Theoretic Sparsification for Gaussian Process Preference Learning
We propose a decision-theoretic sparsification method for Gaussian process preference learning. This method overcomes the loss-insensitive nature of popular sparsification approaches such as the Informative Vector Machine (IVM). Instead of selecting a subset of users and items as inducing points based on uncertainty-reduction principles, our sparsification approach is underpinned by decision the...
Parameter-free Network Sparsification and Data Reduction by Minimal Algorithmic Information Loss
The study of large and complex datasets, or big data, organized as networks has emerged as one of the central challenges in most areas of science and technology. Cellular and molecular networks in biology are a prime example. Hence, a number of techniques for data dimensionality reduction, especially in the context of networks, have been developed. Yet, current techniques require ...
Web-scale Topic Models in Spark: An Asynchronous Parameter Server
In this paper, we train a Latent Dirichlet Allocation (LDA) topic model on the ClueWeb12 data set, a 27-terabyte Web crawl. We extend Spark, a popular tool for performing large-scale data analysis, with an asynchronous parameter server. Such a parameter server provides a distributed and concurrently accessed parameter space for the model. A Metropolis-Hastings based collapsed Gibbs sampler is i...
A Novel Method for VANET Improvement using Cloud Computing
In this paper, we present a novel algorithm for VANET using cloud computing. We accomplish processing, routing and traffic control in a centralized and parallel way by adding one or more servers to the network. Each car or node is considered a Client, in such a manner that routing, traffic control, getting information from clients and data processing and storing are performed by one or more serve...
Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server
This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish linear convergence rate, give explicit expr...
Journal: CoRR
Volume: abs/1709.02091
Publication date: 2017